Exploratory

Smoking vs. Age, Sex

(1): Smoking ~ age, sex

Is there a relationship between age and smoking status?

ANS: Yes, the proportion of smokers decreases with the age.

data %>%
  select(cursmoke, age) %>% 
  ggplot(aes(x = age, y = cursmoke)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 

Does this relationship differ by sex?

ANS: There is a higher proportion of smoker among men compared to women as both age ,but there is no interaction between age and sex.

data %>%
  select(cursmoke, age, sex) %>% 
  ggplot(aes(x = age, y = cursmoke, group = sex, color = sex)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 

(2) number of cigarettes ~ age, sex

Is there a relationship between the number of cigarettes smoked per day and age?

ANS: Yes, number of sigarets smoked per day stays constant for 30-50 years old and decreases with age after 50 years old.

All

data %>%
  select(cigpday, age) %>% 
  ggplot(aes(x = age, y = cigpday)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 
## Warning: Removed 79 rows containing non-finite values (stat_smooth).
## Warning: Removed 79 rows containing missing values (geom_point).

Smokers only

data %>%
  select(cigpday, age) %>% 
  filter(cigpday > 0) %>%
  ggplot(aes(x = age, y = cigpday)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 

Does this relationship differ by sex?

ANS: There is sex effect (men smoke higer number of sigarets per day than women across age), but there is no sex and age interaction.

All

data %>%
  select(cigpday, age, sex) %>% 
  ggplot(aes(x = age, y = cigpday, group = sex, color = sex)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 
## Warning: Removed 79 rows containing non-finite values (stat_smooth).
## Warning: Removed 79 rows containing missing values (geom_point).

Smokers only

data %>%
  select(cigpday, age, sex) %>% 
  filter(cigpday > 0) %>% 
  ggplot(aes(x = age, y = cigpday, group = sex, color = sex)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 

Smoking vs. health outcomes

(1) The relationship between current smoking status and systolic blood pressure.

smoking ~ sysbp

ANS: Proportion of smokers decreases with increase of systolic blood presure

data %>%
  select(cursmoke, sysbp) %>% 
  ggplot(aes(x = sysbp, y = cursmoke)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 

ANS: slightly higher sysbp for non-smokers

data %>%
  select(cursmoke, sysbp) %>% 
  mutate(cursmoke = factor(cursmoke)) %>%
  ggplot(aes(y = sysbp, x = cursmoke)) +
  geom_boxplot(outlier.colour = "white") +
  theme_bw() 

smoking ~ sysbp, sex

ANS: Proportion of smokers decreases with increase of systolic blood presure; the proportion is higher for men (sex effect).

data %>%
  select(cursmoke, sysbp, sex) %>% 
  ggplot(aes(x = sysbp, y = cursmoke, group = sex, color = sex)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 

ANS: no differences in sysbp between male and female smokers and non-smokers

data %>%
  select(cursmoke, sex, sysbp) %>% 
  mutate(cursmoke = factor(cursmoke),
           smoke_sex = interaction(cursmoke, sex)) %>% 
  ggplot(aes(y = sysbp, x = smoke_sex)) +
  geom_boxplot(outlier.colour = "white") +
  theme_bw() 

(2) The relationship between current smoking status and diastolic blood pressure.

smoking ~ diabp

ANS: Proportion of smokers decreases with increase of diastolic blood presure for BP=100 ad then proportion increases again (latter could be due to not enough data)

data %>%
  select(cursmoke, diabp) %>% 
  ggplot(aes(x = diabp, y = cursmoke)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 

ANS: no difference

data %>%
  select(cursmoke, diabp) %>% 
  mutate(cursmoke = factor(cursmoke)) %>%
  ggplot(aes(y = diabp, x = cursmoke)) +
  geom_boxplot(outlier.colour = "white") +
  theme_bw() 

smoking ~ diabp, sex

ANS: Proportion of smokers decreases with increase of diastolic blood presure; the proprtions are higher for men (sex effect).

data %>%
  select(cursmoke, diabp, sex) %>% 
  ggplot(aes(x = diabp, y = cursmoke, group = sex, color = sex)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 

ANS: no difference

data %>%
  select(cursmoke, sex, diabp) %>% 
  mutate(cursmoke = factor(cursmoke),
           smoke_sex = interaction(cursmoke, sex)) %>% 
  ggplot(aes(y = diabp, x = smoke_sex)) +
  geom_boxplot(outlier.colour = "white") +
  theme_bw() 

(3) The relationship between current smoking status and serum total cholesterol.

smoking ~ totchol

ANS: Proportion of smokers slightly decreases with increase of total cholesterol values

data %>%
  select(cursmoke, totchol) %>% 
  ggplot(aes(x = totchol, y = cursmoke)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 
## Warning: Removed 409 rows containing non-finite values (stat_smooth).
## Warning: Removed 409 rows containing missing values (geom_point).

ANS: no difference

data %>%
  select(cursmoke, totchol) %>% 
  mutate(cursmoke = factor(cursmoke)) %>%
  ggplot(aes(y = totchol, x = cursmoke)) +
  geom_boxplot(outlier.colour = "white") +
  theme_bw() 
## Warning: Removed 409 rows containing non-finite values (stat_boxplot).

smoking ~ totchol, sex [!!!]

ANS: Proportion of smokers hasnonlinera relationship with total cholesterol for women; proprtions increases with increase in total cholesterol for men (sex by totchol interaction effect).

data %>%
  select(cursmoke, totchol, sex) %>% 
  ggplot(aes(x = totchol, y = cursmoke, group = sex, color = sex)) +
  geom_jitter(height = 0.1, alpha = 0.1) +
  geom_smooth(lwd = 1.5) +
  theme_bw() 
## Warning: Removed 409 rows containing non-finite values (stat_smooth).
## Warning: Removed 409 rows containing missing values (geom_point).

ANS: no difference

data %>%
  select(cursmoke, sex, totchol) %>% 
  mutate(cursmoke = factor(cursmoke),
           smoke_sex = interaction(cursmoke, sex)) %>% 
  ggplot(aes(y = totchol, x = smoke_sex)) +
  geom_boxplot(outlier.colour = "white") +
  theme_bw() 
## Warning: Removed 409 rows containing non-finite values (stat_boxplot).

Missingness

library(dplyr)
prop <- round(colSums(is.na(data))/dim(data)[1], 3)

knitr::kable(sort(prop, decreasing = TRUE)[1:9], col.names = "Proportion of NAs")
Proportion of NAs
hdlc 0.740
ldlc 0.740
glucose 0.124
bpmeds 0.051
totchol 0.035
educ 0.025
cigpday 0.007
bmi 0.004
heartrte 0.001
#MISSING DATA ANALYSIS
VIM::aggr(data, prop=T, numbers=T)
## Warning in plot.aggr(res, ...): not enough vertical space to display
## frequencies (too many combinations)

mice::md.pattern(data)

##      randid sex age sysbp diabp cursmoke diabetes prevchd prevap prevmi
## 2236      1   1   1     1     1        1        1       1      1      1
## 7074      1   1   1     1     1        1        1       1      1      1
## 267       1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 683       1   1   1     1     1        1        1       1      1      1
## 267       1   1   1     1     1        1        1       1      1      1
## 132       1   1   1     1     1        1        1       1      1      1
## 157       1   1   1     1     1        1        1       1      1      1
## 9         1   1   1     1     1        1        1       1      1      1
## 121       1   1   1     1     1        1        1       1      1      1
## 252       1   1   1     1     1        1        1       1      1      1
## 3         1   1   1     1     1        1        1       1      1      1
## 6         1   1   1     1     1        1        1       1      1      1
## 70        1   1   1     1     1        1        1       1      1      1
## 180       1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 16        1   1   1     1     1        1        1       1      1      1
## 3         1   1   1     1     1        1        1       1      1      1
## 3         1   1   1     1     1        1        1       1      1      1
## 6         1   1   1     1     1        1        1       1      1      1
## 9         1   1   1     1     1        1        1       1      1      1
## 3         1   1   1     1     1        1        1       1      1      1
## 47        1   1   1     1     1        1        1       1      1      1
## 9         1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 9         1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 2         1   1   1     1     1        1        1       1      1      1
## 4         1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 7         1   1   1     1     1        1        1       1      1      1
## 23        1   1   1     1     1        1        1       1      1      1
## 2         1   1   1     1     1        1        1       1      1      1
## 7         1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 2         1   1   1     1     1        1        1       1      1      1
## 4         1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 1         1   1   1     1     1        1        1       1      1      1
## 2         1   1   1     1     1        1        1       1      1      1
##           0   0   0     0     0        0        0       0      0      0
##      prevstrk prevhyp time period death angina hospmi mi_fchd anychd
## 2236        1       1    1      1     1      1      1       1      1
## 7074        1       1    1      1     1      1      1       1      1
## 267         1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 683         1       1    1      1     1      1      1       1      1
## 267         1       1    1      1     1      1      1       1      1
## 132         1       1    1      1     1      1      1       1      1
## 157         1       1    1      1     1      1      1       1      1
## 9           1       1    1      1     1      1      1       1      1
## 121         1       1    1      1     1      1      1       1      1
## 252         1       1    1      1     1      1      1       1      1
## 3           1       1    1      1     1      1      1       1      1
## 6           1       1    1      1     1      1      1       1      1
## 70          1       1    1      1     1      1      1       1      1
## 180         1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 16          1       1    1      1     1      1      1       1      1
## 3           1       1    1      1     1      1      1       1      1
## 3           1       1    1      1     1      1      1       1      1
## 6           1       1    1      1     1      1      1       1      1
## 9           1       1    1      1     1      1      1       1      1
## 3           1       1    1      1     1      1      1       1      1
## 47          1       1    1      1     1      1      1       1      1
## 9           1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 9           1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 2           1       1    1      1     1      1      1       1      1
## 4           1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 7           1       1    1      1     1      1      1       1      1
## 23          1       1    1      1     1      1      1       1      1
## 2           1       1    1      1     1      1      1       1      1
## 7           1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 2           1       1    1      1     1      1      1       1      1
## 4           1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 1           1       1    1      1     1      1      1       1      1
## 2           1       1    1      1     1      1      1       1      1
##             0       0    0      0     0      0      0       0      0
##      stroke cvd hyperten timeap timemi timemifc timechd timestrk timecvd
## 2236      1   1        1      1      1        1       1        1       1
## 7074      1   1        1      1      1        1       1        1       1
## 267       1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 683       1   1        1      1      1        1       1        1       1
## 267       1   1        1      1      1        1       1        1       1
## 132       1   1        1      1      1        1       1        1       1
## 157       1   1        1      1      1        1       1        1       1
## 9         1   1        1      1      1        1       1        1       1
## 121       1   1        1      1      1        1       1        1       1
## 252       1   1        1      1      1        1       1        1       1
## 3         1   1        1      1      1        1       1        1       1
## 6         1   1        1      1      1        1       1        1       1
## 70        1   1        1      1      1        1       1        1       1
## 180       1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 16        1   1        1      1      1        1       1        1       1
## 3         1   1        1      1      1        1       1        1       1
## 3         1   1        1      1      1        1       1        1       1
## 6         1   1        1      1      1        1       1        1       1
## 9         1   1        1      1      1        1       1        1       1
## 3         1   1        1      1      1        1       1        1       1
## 47        1   1        1      1      1        1       1        1       1
## 9         1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 9         1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 2         1   1        1      1      1        1       1        1       1
## 4         1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 7         1   1        1      1      1        1       1        1       1
## 23        1   1        1      1      1        1       1        1       1
## 2         1   1        1      1      1        1       1        1       1
## 7         1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 2         1   1        1      1      1        1       1        1       1
## 4         1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 1         1   1        1      1      1        1       1        1       1
## 2         1   1        1      1      1        1       1        1       1
##           0   0        0      0      0        0       0        0       0
##      timedth timehyp heartrte bmi cigpday educ totchol bpmeds glucose hdlc
## 2236       1       1        1   1       1    1       1      1       1    1
## 7074       1       1        1   1       1    1       1      1       1    0
## 267        1       1        1   1       1    1       1      1       0    1
## 1          1       1        1   1       1    1       1      1       0    1
## 683        1       1        1   1       1    1       1      1       0    0
## 267        1       1        1   1       1    1       1      0       1    1
## 132        1       1        1   1       1    1       1      0       1    0
## 157        1       1        1   1       1    1       1      0       0    1
## 9          1       1        1   1       1    1       1      0       0    0
## 121        1       1        1   1       1    1       0      1       1    0
## 252        1       1        1   1       1    1       0      1       0    0
## 3          1       1        1   1       1    1       0      0       1    0
## 6          1       1        1   1       1    1       0      0       0    0
## 70         1       1        1   1       1    0       1      1       1    1
## 180        1       1        1   1       1    0       1      1       1    0
## 1          1       1        1   1       1    0       1      1       0    1
## 16         1       1        1   1       1    0       1      1       0    0
## 3          1       1        1   1       1    0       1      0       1    1
## 3          1       1        1   1       1    0       1      0       1    0
## 6          1       1        1   1       1    0       0      1       1    0
## 9          1       1        1   1       1    0       0      1       0    0
## 3          1       1        1   1       0    1       1      1       1    1
## 47         1       1        1   1       0    1       1      1       1    0
## 9          1       1        1   1       0    1       1      1       0    0
## 1          1       1        1   1       0    1       1      0       1    0
## 9          1       1        1   1       0    1       1      0       0    1
## 1          1       1        1   1       0    1       0      1       1    0
## 2          1       1        1   1       0    1       0      1       0    0
## 4          1       1        1   1       0    0       1      1       1    0
## 1          1       1        1   1       0    0       0      1       0    0
## 7          1       1        1   0       1    1       1      1       1    1
## 23         1       1        1   0       1    1       1      1       1    0
## 2          1       1        1   0       1    1       1      1       0    1
## 7          1       1        1   0       1    1       1      1       0    0
## 1          1       1        1   0       1    1       1      0       1    1
## 2          1       1        1   0       1    1       0      1       1    0
## 4          1       1        1   0       1    1       0      1       0    0
## 1          1       1        1   0       1    0       1      1       1    1
## 1          1       1        1   0       1    0       1      1       1    0
## 1          1       1        0   1       1    1       1      1       1    0
## 1          1       1        0   1       1    1       0      1       0    0
## 1          1       1        0   0       1    1       1      1       0    0
## 1          1       1        0   0       1    1       0      1       0    0
## 2          1       1        0   0       0    1       1      0       0    1
##            0       0        6  52      79  295     409    593    1440 8600
##      ldlc      
## 2236    1     0
## 7074    0     2
## 267     1     1
## 1       0     2
## 683     0     3
## 267     1     1
## 132     0     3
## 157     1     2
## 9       0     4
## 121     0     3
## 252     0     4
## 3       0     4
## 6       0     5
## 70      1     1
## 180     0     3
## 1       1     2
## 16      0     4
## 3       1     2
## 3       0     4
## 6       0     4
## 9       0     5
## 3       1     1
## 47      0     3
## 9       0     4
## 1       0     4
## 9       1     3
## 1       0     4
## 2       0     5
## 4       0     4
## 1       0     6
## 7       1     1
## 23      0     3
## 2       1     2
## 7       0     4
## 1       1     2
## 2       0     4
## 4       0     5
## 1       1     2
## 1       0     4
## 1       0     3
## 1       0     5
## 1       0     5
## 1       0     6
## 2       1     5
##      8601 20075
# HDLC and LDLC have a lot of missing data, and these variables should be eliminated

prob.data <- data %>%
  group_by(period) %>% 
  summarise(sysbp_prob = sum(sysbp, na.rm = TRUE)/n())
prob.data
table(data$period)
## 
##    1    2    3 
## 4434 3930 3263

Models

library(lme4)
library(dplyr)
  1. Is there a relationship between age and smoking status? Does this relationship differ by sex?
summary(gee::gee(cursmoke ~ age + sex + as.factor(educ) + bmi + diabetes+ heartrte + prevchd + prevstrk + prevhyp + timedth, id = randid, family=binomial, corstr = "unstructured"))
##      (Intercept)              age              sex as.factor(educ)2 
##     6.0056791362    -0.0570318456    -0.7246241605     0.0503157656 
## as.factor(educ)3 as.factor(educ)4              bmi         diabetes 
##    -0.2263406725    -0.2104931713    -0.0940121564    -0.1114780579 
##         heartrte          prevchd         prevstrk          prevhyp 
##     0.0173911758    -0.0569530253    -0.2633153119    -0.2252617826 
##          timedth 
##    -0.0001021991
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Logit 
##  Variance to Mean Relation: Binomial 
##  Correlation Structure:     Unstructured 
## 
## Call:
## gee::gee(formula = cursmoke ~ age + sex + as.factor(educ) + bmi + 
##     diabetes + heartrte + prevchd + prevstrk + prevhyp + timedth, 
##     id = randid, family = binomial, corstr = "unstructured")
## 
## Summary of Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.8962482 -0.3999547 -0.2019900  0.4654084  0.9569903 
## 
## 
## Coefficients:
##                       Estimate   Naive S.E.     Naive z  Robust S.E.
## (Intercept)       5.865249e+00 2.876839e-01  20.3878246 2.918503e-01
## age              -5.085698e-02 2.523157e-03 -20.1560927 2.409602e-03
## sex              -7.161378e-01 5.853646e-02 -12.2340470 6.021884e-02
## as.factor(educ)2  8.329407e-02 6.878131e-02   1.2109985 7.132044e-02
## as.factor(educ)3 -1.824354e-01 8.341765e-02  -2.1870119 8.622144e-02
## as.factor(educ)4 -2.120259e-01 9.400901e-02  -2.2553781 9.703148e-02
## bmi              -8.381699e-02 6.455452e-03 -12.9839063 6.897218e-03
## diabetes         -6.038094e-02 9.758132e-02  -0.6187756 1.015290e-01
## heartrte          8.932497e-03 1.404441e-03   6.3601797 1.481180e-03
## prevchd          -2.590671e-01 7.830011e-02  -3.3086429 9.019862e-02
## prevstrk         -1.886466e-01 1.712697e-01  -1.1014592 1.758071e-01
## prevhyp          -4.865095e-02 4.122989e-02  -1.1799922 4.099808e-02
## timedth          -8.971112e-05 1.418953e-05  -6.3223476 1.414685e-05
##                     Robust z
## (Intercept)       20.0967721
## age              -21.1059707
## sex              -11.8922537
## as.factor(educ)2   1.1678849
## as.factor(educ)3  -2.1158935
## as.factor(educ)4  -2.1851244
## bmi              -12.1522886
## diabetes          -0.5947164
## heartrte           6.0306610
## prevchd           -2.8721845
## prevstrk          -1.0730321
## prevhyp           -1.1866640
## timedth           -6.3414186
## 
## Estimated Scale Parameter:  0.9748127
## Number of Iterations:  4
## 
## Working Correlation
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.7563191 0.5136670
## [2,] 0.7563191 1.0000000 0.5713635
## [3,] 0.5136670 0.5713635 1.0000000
  1. Is there a relationship between the number of cigarettes smoked per day and age? Does this relationship differ by sex?
summary(gee::gee(cigpday ~ age + sex + as.factor(educ),  id = randid, corstr = "unstructured")) 
##      (Intercept)              age              sex as.factor(educ)2 
##       32.8474189       -0.2787825       -5.9230065        0.7894811 
## as.factor(educ)3 as.factor(educ)4 
##       -0.7873846       -0.9535842
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Identity 
##  Variance to Mean Relation: Gaussian 
##  Correlation Structure:     Unstructured 
## 
## Call:
## gee::gee(formula = cigpday ~ age + sex + as.factor(educ), id = randid, 
##     corstr = "unstructured")
## 
## Summary of Residuals:
##        Min         1Q     Median         3Q        Max 
## -16.938455  -7.957787  -4.031599   7.120862  77.643497 
## 
## 
## Coefficients:
##                    Estimate Naive S.E.    Naive z Robust S.E.   Robust z
## (Intercept)      28.3465052 0.87303381  32.468966  0.88378195  32.074094
## age              -0.1969624 0.01219203 -16.155019  0.01052473 -18.714256
## sex              -6.0131712 0.31430105 -19.131884  0.33919691 -17.727671
## as.factor(educ)2  1.1048806 0.37423991   2.952332  0.39926096   2.767314
## as.factor(educ)3 -0.4829763 0.45051511  -1.072054  0.43622459  -1.107174
## as.factor(educ)4 -0.7195971 0.51224189  -1.404799  0.55221861  -1.303102
## 
## Estimated Scale Parameter:  133.1198
## Number of Iterations:  3
## 
## Working Correlation
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.7707103 0.4613932
## [2,] 0.7707103 1.0000000 0.5843157
## [3,] 0.4613932 0.5843157 1.0000000

While answering these questions, please account for any confounders that you have evidence may impact the relationship between age and sex with smoking.

Next you are interested in the relationship between certain health outcomes and smoking status. In particular you are interested in:

  1. The relationship between current smoking status and systolic blood pressure.
summary(gee::gee(sysbp ~ cursmoke + age + sex + as.factor(educ), id = randid, na.action = "na.omit", corstr = "unstructured"))
##      (Intercept)         cursmoke              age              sex 
##       87.8024330       -2.0414490        0.8846592        1.2687106 
## as.factor(educ)2 as.factor(educ)3 as.factor(educ)4 
##       -0.3063701       -2.9072926       -4.0416336
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Identity 
##  Variance to Mean Relation: Gaussian 
##  Correlation Structure:     Unstructured 
## 
## Call:
## gee::gee(formula = sysbp ~ cursmoke + age + sex + as.factor(educ), 
##     id = randid, na.action = "na.omit", corstr = "unstructured")
## 
## Summary of Residuals:
##        Min         1Q     Median         3Q        Max 
## -56.456788 -14.552902  -2.892551  11.072266 147.850985 
## 
## 
## Coefficients:
##                    Estimate Naive S.E.   Naive z Robust S.E.  Robust z
## (Intercept)      88.9127308 1.71558828 51.826380  1.65360529 53.769017
## cursmoke         -1.5149317 0.46476110 -3.259592  0.45715635 -3.313815
## age               0.8653711 0.02377869 36.392712  0.02294929 37.707974
## sex               1.4262673 0.54709991  2.606959  0.57967657  2.460454
## as.factor(educ)2 -0.7481716 0.64772860 -1.155070  0.69947330 -1.069621
## as.factor(educ)3 -3.1296696 0.77855475 -4.019845  0.82616759 -3.788178
## as.factor(educ)4 -4.3249661 0.88481499 -4.887989  0.91529976 -4.725191
## 
## Estimated Scale Parameter:  438.277
## Number of Iterations:  3
## 
## Working Correlation
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.5844383 0.3901881
## [2,] 0.5844383 1.0000000 0.4769521
## [3,] 0.3901881 0.4769521 1.0000000
  1. The relationship between current smoking status and diastolic blood pressure.
summary(gee::gee(diabp ~ cursmoke + age + sex + as.factor(educ), id = randid, na.action = "na.omit", corstr = "unstructured"))
##      (Intercept)         cursmoke              age              sex 
##      83.14931183      -1.76253559       0.06074128      -1.49015111 
## as.factor(educ)2 as.factor(educ)3 as.factor(educ)4 
##      -0.07960126      -0.96361582      -1.38809040
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Identity 
##  Variance to Mean Relation: Gaussian 
##  Correlation Structure:     Unstructured 
## 
## Call:
## gee::gee(formula = diabp ~ cursmoke + age + sex + as.factor(educ), 
##     id = randid, na.action = "na.omit", corstr = "unstructured")
## 
## Summary of Residuals:
##        Min         1Q     Median         3Q        Max 
## -53.815474  -7.934818  -1.165221   6.612862  66.053172 
## 
## 
## Coefficients:
##                     Estimate Naive S.E.    Naive z Robust S.E.   Robust z
## (Intercept)      83.72716018 0.94598765 88.5076670  0.92664562 90.3551025
## cursmoke         -1.25577390 0.25866115 -4.8548996  0.26295278 -4.7756631
## age               0.04492311 0.01318527  3.4070687  0.01301711  3.4510820
## sex              -1.37262875 0.29612384 -4.6353199  0.31463807 -4.3625640
## as.factor(educ)2 -0.20708873 0.35071588 -0.5904743  0.38142348 -0.5429365
## as.factor(educ)3 -1.00717792 0.42133624 -2.3904374  0.43642586 -2.3077870
## as.factor(educ)4 -1.50398297 0.47863392 -3.1422407  0.49916228 -3.0130140
## 
## Estimated Scale Parameter:  134.5107
## Number of Iterations:  3
## 
## Working Correlation
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.5561530 0.3490568
## [2,] 0.5561530 1.0000000 0.3838573
## [3,] 0.3490568 0.3838573 1.0000000
  1. The relationship between current smoking status and serum total cholesterol.
summary(gee::gee(totchol ~ cursmoke + age + sex + as.factor(educ), id = randid, na.action = "na.omit", corstr = "unstructured"))
##      (Intercept)         cursmoke              age              sex 
##      180.5972060        1.6590353        0.7348726       12.0862772 
## as.factor(educ)2 as.factor(educ)3 as.factor(educ)4 
##        0.9107182        2.0078913        0.6363909
## 
##  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
##  gee S-function, version 4.13 modified 98/01/27 (1998) 
## 
## Model:
##  Link:                      Identity 
##  Variance to Mean Relation: Gaussian 
##  Correlation Structure:     Unstructured 
## 
## Call:
## gee::gee(formula = totchol ~ cursmoke + age + sex + as.factor(educ), 
##     id = randid, na.action = "na.omit", corstr = "unstructured")
## 
## Summary of Residuals:
##         Min          1Q      Median          3Q         Max 
## -148.090340  -29.650053   -2.060511   27.146185  399.388552 
## 
## 
## Coefficients:
##                     Estimate Naive S.E.    Naive z Robust S.E.   Robust z
## (Intercept)      179.4752160 3.62072716 49.5688319  3.65765697 49.0683565
## cursmoke           2.2459560 0.97321653  2.3077660  0.99901818  2.2481633
## age                0.7164736 0.04953822 14.4630466  0.04903112 14.6126292
## sex               12.7309870 1.18566784 10.7373976  1.22285054 10.4109101
## as.factor(educ)2   1.0663777 1.40446547  0.7592766  1.49560440  0.7130079
## as.factor(educ)3   2.1301734 1.68938577  1.2609159  1.78623276  1.1925509
## as.factor(educ)4   0.9692929 1.91708122  0.5056087  1.85486438  0.5225681
## 
## Estimated Scale Parameter:  1956.654
## Number of Iterations:  3
## 
## Working Correlation
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.6715907 0.4279521
## [2,] 0.6715907 1.0000000 0.4618901
## [3,] 0.4279521 0.4618901 1.0000000
model_1<-glmer(cursmoke~ age + sex + sysbp + diabp + totchol + as.factor(educ) + (1|randid), family=binomial, na.action = "na.omit")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl =
## control$checkConv, : Model failed to converge with max|grad| = 1.19128 (tol
## = 0.001, component 1)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
##  - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
##  - Rescale variables?
knitr::kable(summary(model_1)$coefficients,digits = 3)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 16.730 0.965 17.332 0.000
age -0.206 0.010 -20.134 0.000
sex -3.760 0.324 -11.597 0.000
sysbp -0.001 0.005 -0.193 0.847
diabp -0.027 0.008 -3.341 0.001
totchol 0.005 0.002 3.029 0.002
as.factor(educ)2 0.922 0.284 3.245 0.001
as.factor(educ)3 -0.330 0.326 -1.014 0.311
as.factor(educ)4 -0.481 0.372 -1.293 0.196
glmer(cursmoke~ age + sex + totchol + as.factor(educ) + bmi + diabetes+ heartrte + prevchd + prevstrk + prevhyp + timedth + (1|randid), family=binomial, na.action = "na.omit") %>% knitr::kable(summary(.)$coefficients,digits = 3) # too many covariates to converge

REPORT

What to Turn In: For this assignment you will turn in a single pdf document with (a) your summary of findings and (b) an Appendix with figures and (c) an Appendix with all R code (in the form of a knited pdf).

OBJECTIVE:

An objective or description of the goals of the analysis

We are interested to describe the smoking habits of the participants in the Framingham Heart study as they age and the impact of smoking on certain health outcomes. The Framingham heart study asks participants about their smoking habits at each visit. In particular, participants are asked if they are currently smoking at this visit (0 = Not a current smoker, 1 = Current smoker), which we will refer to as current smoking status. In addition, participants also report the number of cigarettes they are smoking per day. A more complete description of each of variables in the Framingham Heart study can be found in the Framingham Heart Study Longitudinal Data Documentation.

We are interested to answer the following questions:

  1. Is there a relationship between age and smoking status? Does this relationship differ by sex?

  2. Is there a relationship between the number of cigarettes smoked per day and age? Does this relationship differ by sex?

  3. The relationship between current smoking status and systolic blood pressure.

  4. The relationship between current smoking status and diastolic blood pressure.

  5. The relationship between current smoking status and serum total cholesterol.

STUDY DESIGN:

A brief description of the study design and the data

METHODS:

A methods section describing your statistical analysis (please justify all modeling choices that were made with evidence).

TO INCLUDE:

age - we want to see how smoking status changes as we get older
sex - exploratory data analysis showed there were differences
education - found in the literature
total cholesterol
BMI - confounder based on literature
diabetes (https://www.cdc.gov/tobacco/data_statistics/sgr/50th-anniversary/pdfs/fs_smoking_diabetes_508.pdf)
heart rate - smoking increases heart rate (in literature)
prevchd since prevmi and prevap are correlaed
prevstroke
prevhyp
timedth

DONT INCLUDE: systolic, diastolic bc we have hypertension
BPmeds - not significant
LDL and HDL have too many NAs
glucose - correlated with diabetes

RESULTS:

A results section that includes a) descriptive statistics for the data b) a summary of your key findings including supporting numerical summaries (i.e. confidence intervals, pvalues, etc.) c) interpretations of your key findings (i.e. interpretations of coefficients).

CONCLUSION:

A conclusion specifically answering the objective of the analysis.

APPENDIX: